---
title: Conversation
description: "Using Cloud Endpoints for Voice Inputs and Text to Speech"
---

This section provides examples for integrating multiple cloud-based AI endpoints, such as OpenAI and DeepSeek, for voice input processing, text-to-speech (TTS), and emotion detection. Whether you need to convert spoken language into text (ASR) or generate natural-sounding speech from text, these examples show how to interact with different cloud providers.

## Voice to Text Processing with OpenAI

This example uses your `default` audio input (microphone) and your `default` audio output (speaker). Test both your microphone and speaker in your system settings to confirm they are connected and working. On a Mac, the system may ask for permission to access your audio - grant it when prompted.

```bash
uv run src/run.py conversation
```

Audio support can be marginal on Linux, especially on Ubuntu 20.04 on the Nvidia Orin. Expect some audio inputs and outputs to not work correctly, or to advertise incorrect hardware capabilities - for example, USB microphones that report zero input channels.

## Enumerating your Audio

You can enumerate the available audio devices via the test script in `/system_hw_test`:

```bash
python test_audio.py
```
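If you want to enumerate devices from your own code, the sketch below shows one way to do it with PyAudio. This is a hedged example: the actual `test_audio.py` script may use a different library, and PyAudio is assumed to be installed.

```python
# Sketch: list audio devices with their input/output channel counts.
# Assumes the third-party PyAudio package (`pip install pyaudio`);
# this is illustrative and not the repo's own test_audio.py.
try:
    import pyaudio
except ImportError:
    pyaudio = None  # PyAudio not installed; enumeration will return []

def list_audio_devices():
    """Return a list of (index, name, input_channels, output_channels)."""
    if pyaudio is None:
        return []
    pa = pyaudio.PyAudio()
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        devices.append((i, info["name"],
                        int(info["maxInputChannels"]),
                        int(info["maxOutputChannels"])))
    pa.terminate()
    return devices

if __name__ == "__main__":
    for idx, name, in_ch, out_ch in list_audio_devices():
        print(f"{idx}: {name} (in={in_ch}, out={out_ch})")
```

A device that reports zero input channels here (as mentioned above for some USB microphones) will not work as a voice input even if the OS lists it.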

## Testing Audio

You can provide test sentences for the system to speak by adding a `MockInput` entry to the config file:

```json
{
    "type": "MockInput",
    "config": {
        "input_name": "Voice Input"
    }
}
```

Then connect to the WebSocket (`wscat -c ws://localhost:8765`) and type the words you want the system to speak. This is useful for debugging audio output issues and related settings such as chunk sizes.
